On the number of elements to reorder when updating a suffix array
Abstract
Recently, new algorithms have appeared for updating the Burrows-Wheeler transform or the suffix array when the text they index is modified. These algorithms proceed by reordering entries, and the number of reordered entries may be as high as the length of the text. In practice, however, they update the Burrows-Wheeler transform or the suffix array faster than the fastest reconstruction algorithms. In this article we focus on the number of elements to be reordered for real-life texts. We show that this number is related to LCP values and that, on average, $L_{ave}$ entries are reordered, where $L_{ave}$ denotes the average LCP value, defined as the average length of the longest common prefix between two consecutive sorted suffixes. Since little is known about the LCP distribution of real-life texts, we conduct experiments on a corpus consisting of DNA sequences and natural-language texts. The results show that, apart from texts containing large repetitions, the average LCP value is close to the one expected on a random text.
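To make the definition of $L_{ave}$ concrete, the Python sketch below computes a suffix array, the LCP array between consecutive sorted suffixes, and their mean. It illustrates only the quantity $L_{ave}$, not the update algorithm studied in the article, and it rests on our own assumptions: the suffix array is built by naive sorting, the LCP array uses the linear-time method of Kasai et al., and the mean is taken over the n - 1 adjacent pairs (the article's exact normalization may differ). The function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction by sorting suffix slices (roughly O(n^2 log n)),
    which is enough for small illustrative inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """lcp[r] = length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is 0 by convention.

    Kasai et al.'s linear-time method: walking suffixes in text order,
    the LCP with the preceding sorted suffix drops by at most 1 per step.
    """
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        r = rank[i]
        if r == 0:
            h = 0  # no preceding suffix in sorted order
            continue
        j = sa[r - 1]  # suffix immediately before suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1  # all but one matched character carry over to suffix i + 1
    return lcp


def average_lcp(text: str) -> float:
    """Mean LCP over the n - 1 pairs of consecutive sorted suffixes
    (assumed normalization)."""
    n = len(text)
    if n < 2:
        return 0.0
    sa = suffix_array(text)
    return sum(lcp_array(text, sa)[1:]) / (n - 1)


if __name__ == "__main__":
    for t in ("banana", "abcdefgh", "aaaaaaaa"):
        print(t, average_lcp(t))
```

For "banana" the adjacent LCP values are 1, 3, 0, 0, 2, so the mean is 1.2. A text of distinct characters such as "abcdefgh" gives 0.0, while the fully repetitive "aaaaaaaa" gives 4.0, a toy-scale echo of the observation that large repetitions raise the average LCP.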
Similar articles
Fast Algorithms for Learning with Long N-grams via Suffix Tree Based Matrix Multiplication
This matrix format is inefficient when storing frequency data since we know all entries in x are non-negative integers. Moreover, the number of bits needed to store each index in the jc array is $\lceil \log_2 nz \rceil$, which can be significantly larger than $\lceil \log_2 U \rceil$, where U is the largest number of non-zero elements in any column. Our modified CSC format simply replaces the jc array with an integer array of ...
A Comparison Study on Various Finite Element Models of Riveted Lap Joint by the Use of Dynamic Model Updating
To date, various models have been proposed in the literature to simulate the behavior of riveted structures. To find the most accurate analytical method for modeling the dynamic behavior of riveted structures, this research performs a comparison study on several of these models. For this purpose, experimental modal analysis tests are conducted on a riveted plate to verify the effic...
Contracted Suffix Trees: A Simple and Dynamic Text Indexing Data Structure
We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing, and adapt it to the new problem. We can then p...
Position heaps: A simple and dynamic text indexing data structure
We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve for hashing [1], and adapt it to the...
A Thinning Method of Linear And Planar Array Antennas To Reduce SLL of Radiation Pattern By GWO And ICA Algorithms
In recent years, optimization techniques based on evolutionary algorithms have been widely used to solve electromagnetic problems. These algorithms thin antenna arrays with the aim of reducing complexity, thereby achieving the optimal solution and decreasing the side lobe level. To obtain the optimal solution, thinning is performed by removing some elements in an array thro...
Journal:
- J. Discrete Algorithms
Volume 11
Published: 2012